THE CLUSTERING OF MASSIVE GALAXIES AT z ~ 0.5 FROM THE FIRST SEMESTER OF BOSS DATA
ABSTRACT

We calculate the real- and redshift-space clustering of massive galaxies at z ~ 0.5 using the first semester of
data from the Baryon Oscillation Spectroscopic Survey (BOSS). We study the correlation functions of a sample of
44,000 massive galaxies in the redshift range 0.4 < z < 0.7. We present halo-occupation distribution (HOD) modeling
of the clustering results and discuss the implications for the manner in which massive galaxies at z ~ 0.5
occupy dark matter halos. The majority of our galaxies are central galaxies living in halos of mass 10^13 h^-1 M_⊙,
but 10% are satellites living in halos 10 times more massive. These results are broadly in agreement with
earlier investigations of massive galaxies at z ~ 0.5. The inferred large-scale bias (b ~ 2) and relatively high
number density (n̄ = 3 × 10^-4 h^3 Mpc^-3) imply that BOSS galaxies are excellent tracers of large-scale structure,
suggesting BOSS will enable a wide range of investigations on the distance scale, the growth of large-scale
structure, massive galaxy evolution and other topics.
Subject headings: cosmology: large-scale structure of universe 



1. INTRODUCTION 

The distribution of objects in the Universe displays a high 
degree of organization, which in current models is due to 
primordial fluctuations in density which were laid down at 
very early times and amplified by the process of gravitational 
instability. Characterizing the evolution of this large-scale 
structure is a central theme of cosmology and astrophysics. 
In addition to allowing us to understand the structure itself,
large-scale structure studies offer an incisive tool for
probing cosmology and particle physics and set the context for
our modern understanding of galaxy formation and evolution.
Since the pioneering studies of Humason et al. (1956),
Gregory & Thompson (1978), Jõeveer & Einasto (1978) and
the first CfA redshift survey (Huchra et al. 1983), galaxy
redshift surveys have played a key role in this enterprise, and
ever larger surveys have provided increasing insight and ever
tighter constraints on cosmological models.

This paper presents the first measurements of the clustering
of massive galaxies from the Baryon Oscillation Spectroscopic
Survey (BOSS; Schlegel et al. 2009), based on a sample of
galaxy redshifts observed during the period January
through July 2010. We demonstrate that BOSS is efficiently
obtaining redshifts of some of the most luminous galaxies at
z ~ 0.5, and has already become the largest such redshift
survey ever undertaken. The high bias and number density of
these objects (described below) make them ideal tracers of
large-scale structure, and suggest that BOSS will make a
significant impact on many science questions, including a
determination of the cosmic distance scale, the growth of structure
and the evolution of massive galaxies.

The outline of the paper is as follows. In §2 we briefly
describe the BOSS survey and observations, and define the
sample we focus on in this paper. Our clustering results are
described in §3 and interpreted in the framework of the halo
model in §4, where we also compare to previous work on the
clustering of massive galaxies at intermediate redshift. We
conclude with a discussion of the implications of these results
in §5, while some technical details on the construction of our
mock catalogs are relegated to an Appendix. Throughout this
paper, when measuring distances we refer to comoving
separations, measured in h^-1 Mpc with H_0 = 100 h km s^-1 Mpc^-1.
We convert redshifts to distances assuming a ΛCDM
cosmology with Ω_m = 0.274, Ω_Λ = 0.726 and h = 0.70. This is
the same cosmology as assumed for the N-body simulations
from which we make our mock catalogs (see Appendix A).

FIG. 1. The (comoving) number density of galaxies, n̄(z), for the sample
described in the text (§2). The vertical dashed lines indicate the redshift limits
we use in our analysis: 0.4 < z < 0.7.
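For readers who want to reproduce the redshift-to-distance conversion, here is a minimal sketch in Python. It assumes a spatially flat ΛCDM model with the parameters quoted above, neglects radiation, and works in h^-1 Mpc so that h drops out. The helper names (`hubble_e`, `comoving_distance`) are our own, and Simpson's rule is just one convenient choice of quadrature.

```python
import math

C_OVER_H0 = 2997.92458  # c/H0 in h^-1 Mpc (H0 = 100 h km/s/Mpc)


def hubble_e(z, omega_m=0.274, omega_l=0.726):
    """E(z) = H(z)/H0 for a flat LCDM model, ignoring radiation."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)


def comoving_distance(z, n_steps=200, **cosmo):
    """Line-of-sight comoving distance to redshift z, in h^-1 Mpc.

    Integrates (c/H0) dz'/E(z') with composite Simpson's rule
    (n_steps is bumped to an even number if necessary).
    """
    if z <= 0.0:
        return 0.0
    n_steps += n_steps % 2
    dz = z / n_steps
    total = 1.0 / hubble_e(0.0, **cosmo) + 1.0 / hubble_e(z, **cosmo)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) / hubble_e(k * dz, **cosmo)
    return C_OVER_H0 * total * dz / 3.0


for z in (0.4, 0.5, 0.55, 0.7):
    print(f"z = {z:.2f}: chi = {comoving_distance(z):7.1f} h^-1 Mpc")
```

With these parameters χ(z = 0.5) comes out at roughly 1,330 h^-1 Mpc. Multiplying an angle in radians by χ gives the transverse comoving separations used in the rest of the paper.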

2. OBSERVATIONS 

The Sloan Digital Sky Survey (SDSS; York et al. 2000)
mapped nearly a quarter of the sky using the dedicated Sloan
Foundation 2.5 m telescope (Gunn et al. 2006) located at
Apache Point Observatory in New Mexico. A drift-scanning
mosaic CCD camera (Gunn et al. 1998) imaged the sky in five
photometric bandpasses (Fukugita et al. 1996; Smith et al.
2002; Doi et al. 2010) to a limiting magnitude of r ~ 22.5.
The imaging data were processed through a series of pipelines
that perform astrometric calibration (Pier et al. 2003),
photometric reduction (Lupton et al. 2001), and photometric
calibration (Padmanabhan et al. 2008). The magnitudes were
corrected for Galactic extinction using the maps of Schlegel et al.
(1998). BOSS, a part of the SDSS-III survey (Eisenstein et al.,
in prep.), has completed an additional 3,000 square degrees
of imaging in the southern Galactic cap, taken in a manner
identical to the original SDSS imaging. All of the data have
been processed through the latest versions of the pipelines, and
BOSS is obtaining spectra of a selected subset (Padmanabhan
et al., in preparation) of 1.5 million galaxies approximately
volume limited to z ≈ 0.6 (in addition to spectra of 150,000
quasars and various ancillary observations). The targets are
assigned to tiles of diameter 3° using an adaptive tiling
algorithm (Blanton et al. 2003). Aluminum plates are drilled with
holes corresponding to the positions of objects on each tile,
and manually plugged with optical fibers that feed a pair of
double spectrographs. These spectrographs are significantly
upgraded from those used by SDSS-I/II (York et al. 2000;
Stoughton et al. 2002), with improved CCDs with better red
response, higher-throughput gratings, 1,000 fibers (instead of
640) and a 2″ entrance aperture (previously 3″). The spectra cover
the range 3,600 Å to 10,000 Å, at a resolution of about 2,000.

BOSS makes use of luminous galaxies selected from the
multi-color SDSS imaging to probe large-scale structure at
intermediate redshift (z < 0.7). These galaxies are among the
most luminous galaxies in the universe and trace a large
cosmological volume while having a high enough number density
to ensure that shot noise is not a dominant contributor to the
clustering variance. The majority of the galaxies have old
stellar populations, and the prominent 4,000 Å break in their spectral
energy distributions makes them relatively easy to select in
multi-color data.

FIG. 2. The distribution of absolute magnitudes for the sample analyzed
in this paper. We have k+e corrected the r-band magnitudes to z = 0.55 using
the g − i color, assuming a passively evolving galaxy; since the redshift range
is small this amounts to a small correction. This sample consists of
intrinsically very bright, and massive, galaxies with stellar masses several times the
characteristic mass in a Schechter fit. The luminosity function of Faber et al.
(2007) at z = 0.5, converted from B to r band assuming a redshifted
elliptical galaxy template, has a characteristic absolute magnitude of −19.8.
Converting the Bell et al. (2004) luminosity function using a high-z single
burst model gives −20. All of the CMASS galaxies are therefore brighter than
this characteristic luminosity.

The strategy behind, and details of, our target selection are
described in Padmanabhan et al. (in preparation). Cuts
in color-magnitude space allow a roughly volume-limited
sample of luminous galaxies to be selected and partitioned
into broad redshift bins. Briefly, we follow the SDSS-I/II
procedure described in Eisenstein et al. (2001) and define a
"rotated" combination of colors d⊥ = (r − i) − (g − r)/8. The
sample we analyze in this paper (the so-called "CMASS sample",
since it is approximately stellar mass limited) is defined via

$$
\begin{aligned}
& d_\perp > 0.55 \quad\text{and}\quad i < 19.9 \quad\text{and}\quad i_{\rm fiber2} < 21.5, \\
& i < 19.86 + 1.6\,(d_\perp - 0.8) \quad\text{and}\quad r - i < 2,
\end{aligned}
\tag{1}
$$

where magnitude cuts use "cmodel magnitudes" and colors
are defined with "model magnitudes", except for i_fiber2,
which is the magnitude in the 2″ spectroscopic fiber (see
Stoughton et al. 2002; Abazajian et al. 2004 for definitions of
the magnitudes and further discussion). There are two
additional cuts to reduce stellar contamination: z_psf − z > 9.125 −
0.45 z and r_psf − r > 0.3.
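The cuts in Eq. (1), together with the two star-galaxy cuts, translate directly into a boolean mask. The sketch below takes NumPy arrays of magnitudes under hypothetical argument names. The text does not say which magnitude flavor enters the star-galaxy cuts, so our use of model magnitudes there is an assumption. The code illustrates the published inequalities and is not the BOSS targeting code.

```python
import numpy as np


def d_perp(g_mod, r_mod, i_mod):
    """Rotated color d_perp = (r - i) - (g - r)/8, from model magnitudes."""
    return (r_mod - i_mod) - (g_mod - r_mod) / 8.0


def cmass_mask(g_mod, r_mod, i_mod, i_cmod, i_fib2, r_psf, z_psf, z_mod):
    """True where an object passes Eq. (1) and the two stellar-rejection cuts.

    Colors use model magnitudes, magnitude limits use cmodel magnitudes and
    i_fib2 is the 2-arcsec fiber magnitude. Using model magnitudes in the
    star-galaxy cuts is an assumption of this sketch.
    """
    dp = d_perp(g_mod, r_mod, i_mod)
    eq1 = (
        (dp > 0.55)
        & (i_cmod < 19.9)
        & (i_fib2 < 21.5)
        & (i_cmod < 19.86 + 1.6 * (dp - 0.8))
        & (r_mod - i_mod < 2.0)
    )
    not_star = (z_psf - z_mod > 9.125 - 0.45 * z_mod) & (r_psf - r_mod > 0.3)
    return eq1 & not_star


# Two made-up objects: identical except that the second is too compact in z.
mags = dict(
    g_mod=np.array([21.8, 21.8]), r_mod=np.array([20.3, 20.3]),
    i_mod=np.array([19.4, 19.4]), i_cmod=np.array([19.3, 19.3]),
    i_fib2=np.array([21.0, 21.0]), r_psf=np.array([20.9, 20.9]),
    z_psf=np.array([19.8, 19.1]), z_mod=np.array([19.0, 19.0]),
)
selected = cmass_mask(**mags)
print(selected)
```

The first object satisfies every inequality. The second fails only the z_psf − z cut, which is the kind of point-like source the stellar-rejection cuts are meant to remove.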

These cuts isolate the z ~ 0.5, high-mass galaxies. The
i–d⊥ constraint is approximately a cut in absolute magnitude
or stellar mass, with d⊥ closely tracking redshift for
these galaxies. As discussed in detail in Padmanabhan et
al. (in preparation), the slope of the i–d⊥ cut is set to
parallel the track of a passively evolving, constant stellar mass
galaxy as determined from the population synthesis models
of Maraston et al. (2009). This approach leads to an
approximately stellar mass limited sample. We restrict ourselves to
galaxies in the redshift range 0.4 < z < 0.7 (Fig. 1). Note
that our selection places the majority of the galaxies within
Δz = 0.1 of the median; this has the advantage of making
the analysis relatively straightforward but means we need to
combine with other samples to obtain leverage in redshift. A
comparison of the cuts defining this sample with other,
similar samples in the literature will be presented in
Padmanabhan et al. (in prep.). In general BOSS goes both fainter and
bluer than the earlier samples, targeting "luminous galaxies"
rather than "luminous red galaxies".

FIG. 3. (Top) The sky coverage of the galaxies used in this analysis,
in orthographic projection centered on α_J2000 = 180° and δ_J2000 = 0°. The
regions A, B and C described in the text are marked. (Bottom) A zoom-in
of region A with the greyscale showing completeness. This region is the
most contiguous of the three, and region B is the least contiguous owing to
hardware problems in the early part of the year.

The distribution of absolute (r-band) magnitude for the
sample is shown in Fig. 2, where we see that all of the CMASS
galaxies are intrinsically very luminous. Using the modeling
of Maraston et al. (in preparation) on the BOSS spectra, we
find the median stellar mass of the sample is 10^11.7 M_⊙. While
the detailed numbers depend on assumptions about, e.g., the
initial mass function, these galaxies are at the very high mass
end of the stellar mass function at this redshift for any
reasonable assumptions.

The clustering measurements in this paper are based on the
data taken by BOSS up to the end of July 2010, which include
120,000 galaxies over 1,600 deg² of sky. However, the data
prior to January 2010 were taken in commissioning mode and
little of those data are of survey quality. Once we trim the data
to contiguous regions (Fig. 3) with high redshift completeness
and select galaxies at z ~ 0.5, we are left with 44,000 galaxies,
covering 580 square degrees, which we have used in our
analysis.

The sky coverage of our sample can be seen in Figure 3.
We view the data as comprising three regions of the sky,
hereafter referred to as A, B and C (see Fig. 3). Galaxies in these
regions are separated from those in any other region by
several hundred Mpc, and we shall consider them independent.

FIG. 4. Contours of the redshift-space correlation function, ξ(R, Z), for
our 0.4 < z < 0.7 galaxy sample (see text). Note the characteristic elongation
in the Z direction at small R (fingers-of-god) and squashing at large R
(supercluster infall). The upper panel shows the results from the BOSS data, while
the lower panel is from our mock catalogs. The level of agreement is quite
good, as can be seen more quantitatively in later figures.
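Convenient "rectangular" boundaries to the regions are

$$ {\rm A}:\quad 105^\circ < \alpha_{2000} < 135^\circ, \qquad 25^\circ < \delta_{2000} < 60^\circ \tag{2} $$

$$ {\rm B}:\quad 125^\circ < \alpha_{2000} < 240^\circ, \qquad -5^\circ < \delta_{2000} < 5^\circ \tag{3} $$

$$ {\rm C}:\quad 185^\circ < \alpha_{2000} < 255^\circ, \qquad 10^\circ < \delta_{2000} < 45^\circ \tag{4} $$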



These boundaries yield widths (heights) of 600 (700), 2600
(270) and 1600 (800) h^-1 Mpc, respectively, at z ~ 0.5. As we
shall discuss below, the data are consistent with having the
same clustering and redshift distribution in all three regions.

3. CLUSTERING MEASURES 
We compute several two-point, configuration-space
clustering statistics in this paper. The basis for all of these
calculations is the two-point galaxy correlation function on a
two-dimensional grid of pair separations parallel and
perpendicular to the line of sight: ξ(R, Z).

To estimate the counts expected for unclustered objects
while accounting for the complex survey geometry, we
generate random catalogs with the detailed radial and angular
selection functions of the sample but with 50× the number of
points. Numerous tests have confirmed that the survey
selection function factorizes into an angular and a redshift piece.
The redshift selection function can be taken into account by
distributing the randoms according to the observed redshift
distribution of the sample. The completeness on the sky is
determined from the fraction of target galaxies in a sector
for which we obtained a high-quality redshift, with the
sectors being areas of the sky covered by a unique set of
spectroscopic tiles (see Blanton et al. 2003; Tegmark et al. 2004).
We use the MANGLE software (Swanson et al. 2008) to track
the angular completeness. In computing the redshift
completeness we omit from both the target and success lists
those galaxies for which a redshift was already known from an
earlier survey, and later randomly sample such galaxies with
the resulting completeness when constructing the input catalog.
Since very few of our targets at z ~ 0.5 have existing
redshifts, this is a very small correction. Not all of the spectra
taken resulted in a reliable redshift, and the failure probability
has angular structure due to hardware limitations, which
produce spatial fluctuations in the signal-to-noise of the
observations. We find no evidence that this failure is redshift
dependent: low- and high-redshift failure regions have the same
redshift distribution. We therefore apply a small angular correction
for this spatial structure by up-weighting galaxies based on the
signal-to-noise of each spectrum and the probability of
redshift measurement. This is a small correction and only affects
our results at the percent level. To avoid issues arising from
small-number statistics we only keep sectors with area larger
than 10^-4 sr, or approximately 0.3 deg². At the observed
mean density (150 deg^-2) we expect several tens of galaxies
in any such region, enabling us to reliably determine the
redshift completeness¹. We trim the final area to all sectors with
completeness greater than 75%, producing our final sample
of 44,000 galaxies, distributed as 5,000 in region A, 14,000
in region B and 24,000 in region C. After the cut, the median
galaxy-weighted completeness is 88%, 84% and 88% in
regions A, B and C, respectively.
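Below is a sketch of the two ingredients just described: a random catalog whose angular part follows a completeness map and whose radial part is resampled from the data, plus the binomial completeness uncertainty given in footnote 1. It assumes a single rectangular region and a user-supplied completeness callable. The function names, the toy completeness map, and the stand-in redshifts are all hypothetical, and MANGLE polygons are not modeled.

```python
import numpy as np


def completeness_stats(m, n):
    """Footnote 1: mode, mean and std. dev. of the completeness when m of n
    targets in a sector yield redshifts (binomial likelihood, flat prior)."""
    mode = m / n
    mean = (m + 1) / (n + 2)
    var = (m * (n - m) + n + 1) / ((n + 2) ** 2 * (n + 3))
    return mode, mean, float(np.sqrt(var))


def make_randoms(z_data, ra_lim, dec_lim, completeness, factor=50, seed=None):
    """Unclustered catalog for a rectangular RA/Dec box (degrees).

    Angular part: points uniform on the sphere inside the box, kept with
    probability completeness(ra, dec). Radial part: redshifts resampled from
    the observed ones, which relies on the selection factorizing.
    The completeness function must be positive somewhere, or this never ends.
    """
    rng = np.random.default_rng(seed)
    n_want = factor * len(z_data)
    sin_lo, sin_hi = np.sin(np.radians(dec_lim))
    ra_kept, dec_kept, n_have = [], [], 0
    while n_have < n_want:
        n_try = 2 * (n_want - n_have)
        ra = rng.uniform(ra_lim[0], ra_lim[1], n_try)
        dec = np.degrees(np.arcsin(rng.uniform(sin_lo, sin_hi, n_try)))
        keep = rng.random(n_try) < completeness(ra, dec)
        ra_kept.append(ra[keep])
        dec_kept.append(dec[keep])
        n_have += int(keep.sum())
    ra = np.concatenate(ra_kept)[:n_want]
    dec = np.concatenate(dec_kept)[:n_want]
    z = rng.choice(z_data, size=n_want, replace=True)
    return ra, dec, z


print(completeness_stats(9, 12))  # standard deviation ~0.12
z_obs = np.random.default_rng(1).uniform(0.4, 0.7, 100)  # stand-in redshifts
ra, dec, z = make_randoms(z_obs, (105.0, 135.0), (25.0, 60.0),
                          lambda ra, dec: np.where(dec > 40.0, 0.80, 0.90), seed=2)
```

Drawing sin δ uniformly, rather than δ itself, keeps the randoms uniform per unit solid angle. Resampling the observed redshifts is the simplest way to impose the radial selection, although it carries the data's own redshift fluctuations into the randoms.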

We estimate ξ(R, Z) using the Landy & Szalay (1993)
estimator

$$ \xi(R,Z) = \frac{DD - 2DR + RR}{RR} \tag{5} $$

where DD, DR and RR are suitably normalized numbers of
(weighted) data-data, data-random and random-random pairs
in bins of (R, Z). We experimented with two sets of weights,
one to correct for fiber collisions (described below) and one
to reduce the variance of the estimator. The latter was

$$ w_i = \frac{1}{1 + \bar{n}(z_i)\,\xi_V(s)} \tag{6} $$

where n̄(z_i) is the mean density at redshift z_i and ξ_V is a model
for the volume-integrated redshift-space correlation function
within s. We approximated ξ_V = 4π s_0^2 s, corresponding to
ξ(s) = (s/s_0)^-2, and took s_0 = 8 h^-1 Mpc. The details of the
weighting scheme did not affect our final result on the scales
of interest to us here; in fact, dropping this weight altogether
gave comparable results, and so we neglect this weight in what
follows.

1 Assuming binomial statistics, if M of N galaxies have redshifts, the most
likely completeness is c = M/N, the mean is (M + 1)/(N + 2) and the variance
is [M(N − M) + N + 1]/[(N + 2)^2 (N + 3)]. For example, if N = 12 and M = 9,
the error on c is approximately 10%. For N > 100 the error is under 5%.
Unless the scatter is somehow correlated with the signal, these uncertainties are
negligible. In fact, we find that ignoring the exact value of the completeness
in constructing our random catalog only slightly alters our final ξ.
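The estimator in Eq. (5) and the weight in Eq. (6) fit in a few lines of NumPy. This is a brute-force illustration that bins in isotropic separation in a flat box, whereas the paper bins in (R, Z) over a curved-sky survey. We read "suitably normalized" as dividing each count by the total weighted number of pairs, and we take pair weights to be products w_i w_j. Both choices are assumptions, and all function names are hypothetical.

```python
import numpy as np


def pair_counts(x1, w1, x2, w2, edges, auto=False):
    """Weighted pair counts in separation bins by brute force (O(N1*N2) memory)."""
    sep = np.sqrt(((x1[:, None, :] - x2[None, :, :]) ** 2).sum(axis=-1))
    wprod = w1[:, None] * w2[None, :]
    if auto:  # count each distinct pair once, no self-pairs
        iu = np.triu_indices(len(x1), k=1)
        sep, wprod = sep[iu], wprod[iu]
    counts, _ = np.histogram(sep.ravel(), bins=edges, weights=wprod.ravel())
    return counts


def landy_szalay(dd, dr, rr, wd, wr):
    """Eq. (5) with counts normalized by the total weighted number of pairs."""
    sd, sr = wd.sum(), wr.sum()
    norm_dd = (sd ** 2 - (wd ** 2).sum()) / 2.0
    norm_rr = (sr ** 2 - (wr ** 2).sum()) / 2.0
    DD, DR, RR = dd / norm_dd, dr / (sd * sr), rr / norm_rr
    return (DD - 2.0 * DR + RR) / RR


def variance_weight(nbar, s, s0=8.0):
    """Eq. (6) with xi_V(s) = 4 pi s0^2 s, i.e. xi(s) = (s/s0)^-2."""
    return 1.0 / (1.0 + nbar * 4.0 * np.pi * s0 ** 2 * s)


# Unclustered "data" against randoms in a unit box: xi should scatter about zero.
rng = np.random.default_rng(3)
data, rand = rng.random((300, 3)), rng.random((900, 3))
wd, wr = np.ones(len(data)), np.ones(len(rand))
edges = np.array([0.1, 0.2, 0.3])
xi = landy_szalay(pair_counts(data, wd, data, wd, edges, auto=True),
                  pair_counts(data, wd, rand, wr, edges),
                  pair_counts(rand, wr, rand, wr, edges, auto=True), wd, wr)
print(xi, variance_weight(3e-4, 10.0))
```

At the abstract's n̄ = 3 × 10^-4 h^3 Mpc^-3 and s = 10 h^-1 Mpc the weight is about 0.29. In practice a tree-based pair counter replaces the O(N²) loop.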

We are unable to obtain redshifts for approximately 7% of
the galaxies due to fiber collisions: no two fibers in any
given observation can be placed closer than 62″. At z ≈ 0.5
the 62″ exclusion corresponds to 0.4 h^-1 Mpc. Where
possible we obtain redshifts for the collided galaxies in regions
where plates overlap, but the remaining exclusion must be
accounted for. We correct for its impact by (a) restricting
our analysis to relatively large scales and (b) up-weighting
galaxy-galaxy pairs in the analysis with angular separations
smaller than 62″. The weight is derived by comparing the
angular correlation function of the entire photometric
sample with that of the galaxies for which we obtained redshifts
(Hawkins et al. 2003; Li et al. 2006; Ross et al. 2007). This
ratio is very close to unity above 62″ but significantly
depressed below this scale. Note that in our situation there is
a close correspondence between angular separation and
transverse separation, since our survey volume is a relatively
narrow shell with reasonably large radius, so the number of pairs
for which this correction is appreciable is quite small.
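Two small calculations sit behind this paragraph: converting the 62″ fiber separation into a comoving scale, and turning the ratio of angular correlation functions into a pair weight. The sketch uses the flat ΛCDM parameters of §1. It takes the up-weight to be (1 + w_phot(θ))/(1 + w_spec(θ)), evaluated at each pair's angular separation; that precise form is our assumption, and the function names are hypothetical.

```python
import math

C_OVER_H0 = 2997.92458  # c/H0 in h^-1 Mpc


def comoving_distance(z, omega_m=0.274, n=400):
    """Flat-LCDM comoving distance in h^-1 Mpc (midpoint rule, no radiation)."""
    dz = z / n
    inv_e = (
        1.0 / math.sqrt(omega_m * (1.0 + (k + 0.5) * dz) ** 3 + 1.0 - omega_m)
        for k in range(n)
    )
    return C_OVER_H0 * dz * sum(inv_e)


def transverse_separation(theta_arcsec, z):
    """Comoving transverse separation (h^-1 Mpc) subtended by an angle at redshift z."""
    return math.radians(theta_arcsec / 3600.0) * comoving_distance(z)


def collision_pair_weight(w_phot, w_spec):
    """Assumed pair up-weight: ratio of (1 + w(theta)) for the photometric
    and redshift samples at the pair's angular separation."""
    return (1.0 + w_phot) / (1.0 + w_spec)


print(f"62 arcsec at z = 0.5 -> {transverse_separation(62.0, 0.5):.2f} h^-1 Mpc")
```

This prints about 0.40 h^-1 Mpc, consistent with the scale quoted above. Above 62″ the two angular correlation functions agree and the weight tends to unity, as the text describes.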

Contours of the 2D correlation function for our 0.4 < z <
0.7 galaxy sample are shown in Figure 4. Note the
characteristic elongation in the Z direction at small R (fingers-of-god)
and squashing at large R (supercluster infall).

To mitigate the effects of redshift-space distortions we
follow standard practice and compute from ξ(R, Z) the projected
correlation function (e.g., Davis & Peebles 1983)

$$ w_p(R) = 2\int_0^{\infty} dZ\,\xi(R,Z) \tag{7} $$

In practice we integrate to 100 h^-1 Mpc, which is sufficiently
large to include almost all correlated pairs. We also compute
the angularly averaged, redshift-space correlation function,
ξ(s), and the cross-correlation between the CMASS sample
selected from the imaging and the spectroscopic samples, w_×.
For all of these measures the full covariance matrix is
computed from a set of mock catalogs based on a halo-occupation
distribution (HOD) modeling of the data (§4 and Appendix A).
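The truncated integral in Eq. (7) can be checked against a case with a closed form. The sketch assumes a pure power law ξ(r) = (r/r_0)^-γ with r_0 = 7.5 h^-1 Mpc and γ = 1.8, the values in the Fig. 5 caption. It evaluates the midpoint Riemann sum to Z = 100 h^-1 Mpc with 100 linear bins, as described in §3.1, and compares the result with the analytic w_p integrated to infinity. A pure power law has a heavier large-separation tail than real clustering, so this toy likely overstates the truncation effect, and the same truncation can be applied to model predictions. Function names are hypothetical.

```python
import math
import numpy as np


def xi_power_law(r, r0=7.5, gamma=1.8):
    """Real-space power law xi(r) = (r/r0)^-gamma."""
    return (r / r0) ** (-gamma)


def wp_riemann(R, xi, z_max=100.0, n_bins=100):
    """Eq. (7) truncated at z_max: midpoint Riemann sum over n_bins linear Z bins."""
    dZ = z_max / n_bins
    Z = (np.arange(n_bins) + 0.5) * dZ
    return 2.0 * dZ * np.sum(xi(np.sqrt(R ** 2 + Z ** 2)))


def wp_power_law_exact(R, r0=7.5, gamma=1.8):
    """Closed form of Eq. (7) to Z = infinity for the same power law."""
    coeff = math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)
    return R * (r0 / R) ** gamma * coeff


# Eight bins equally spaced in ln R from 0.3 to 30 h^-1 Mpc, evaluated at log mid-points.
edges = np.geomspace(0.3, 30.0, 9)
R_mid = np.sqrt(edges[:-1] * edges[1:])
for R in R_mid:
    ratio = wp_riemann(R, xi_power_law) / wp_power_law_exact(R)
    print(f"R = {R:6.2f}  w_p(Z<100)/w_p(Z<inf) = {ratio:.3f}")
```

The missing fraction grows with R because the line-of-sight tail beyond 100 h^-1 Mpc is a larger share of a smaller total.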

We now discuss each of the clustering measurements in 
turn, beginning with the real-space clustering. 

3.1. Real-space clustering 

TABLE 1

The projected correlation function data and covariance matrix, for 8 equally spaced bins in ln R. Both R and w_p are measured in units of h^-1 Mpc, with R quoted
at the bin mid-point. To reduce the condition number of the covariance matrix we quote means, errors and covariances on R w_p, which removes much of the run
of w_p with scale and makes the quoted data points more similar in magnitude. The error bars, σ_i, from the diagonal of C, are broken out separately in the 3rd row
and the correlation matrix, C_ij/(σ_i σ_j), is quoted in the lower part of the table. The full covariance matrix should be used in any fit, and the finite width of the R
bins should be included in theoretical predictions.

FIG. 5. The projected correlation function for the 0.4 < z < 0.7 sample in
regions A, B and C (lines) and for the combined sample (points with errors).
The errors on the individual samples have been suppressed for clarity. The
data are combined using the full covariance matrix, but only the diagonal
elements are plotted. The w_p implied by a power-law correlation function
of slope −1.8 and correlation length of 7.5 h^-1 Mpc forms a reasonable fit to
the data with 1 < R_p < 10 h^-1 Mpc, but we do not plot it here for clarity. The
(thick) long-dashed-dotted line shows the prediction of the best-fitting HOD
model (§4), which provides a reasonable fit on all scales plotted (recall that the
errors are correlated).

The projected correlation function for the 0.4 < z < 0.7
sample is shown in Figure 5. We chose 8 bins, equally spaced
in ln R between 0.3 h^-1 Mpc and 30 h^-1 Mpc, as a compromise
between retaining the relevant information and generating
stable covariance matrices via Monte Carlo. The finite width of
these bins should be borne in mind when comparing
theoretical models to these data. The integration over Z in Eq. (7)
was done by Riemann sum using 100 linearly spaced bins in
Z. The results were well converged at this spacing because
of the "smearing" of the correlation function along the line
of sight due to redshift-space effects. The data were analyzed
separately in each of regions A, B and C and then combined
in a minimum variance manner:

$$ C^{-1} w_p = \sum_{a=A,B,C} \left[C^{(a)}\right]^{-1} w_p^{(a)} \tag{8} $$

with

$$ C^{-1} = \sum_{a=A,B,C} \left[C^{(a)}\right]^{-1} \tag{9} $$

where w_p^(a) represents the vector of w_p measurements from
region a = A, B or C, and C^(a) is its covariance matrix. Not surprisingly,
the combined result is dominated by the results from region C. To reduce
the condition number of the covariance matrix, and the dynamic range
in w_p, we fit throughout to R w_p and quote the results in that
form. The w_p points are quite covariant, in part because the
integration in Eq. (7) introduces a large mixing of power at
different R; use of the full covariance matrix is thus essential.
The error bars on the individual w_p^(a) have been suppressed in
the figure for clarity, and the square roots of the diagonal
elements of the covariance matrix are shown as error bars on the
combined result.

FIG. 6. The projected correlation function of the high- and low-z samples
(lines), split at the mid-point of the range, and of the full sample (points with
errors), indicating that the clustering is evolving little and the sample can be
analyzed in one wide redshift bin.
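Eqs. (8)–(9) define an inverse-covariance weighted mean. The sketch below uses made-up numbers and a hypothetical function name to show the combination and the behavior noted in the text: the region with the smaller covariance, here a stand-in for region C, dominates the result. Explicit inverses are used for readability. For ill-conditioned matrices `np.linalg.solve` would be the safer choice, and keeping the condition number down is one motivation the text gives for fitting R w_p.

```python
import numpy as np


def combine_min_variance(vectors, covariances):
    """Eqs. (8)-(9): inverse-covariance weighted mean of per-region data vectors.

    Returns the combined vector and its covariance C = (sum_a C_a^-1)^-1.
    """
    inv_covs = [np.linalg.inv(c) for c in covariances]
    c_comb = np.linalg.inv(sum(inv_covs))
    rhs = sum(ic @ v for ic, v in zip(inv_covs, vectors))
    return c_comb @ rhs, c_comb


# Toy example with three bins of R*w_p from two regions (numbers are made up).
v_a, v_c = np.array([50.0, 60.0, 70.0]), np.array([54.0, 58.0, 74.0])
c_a = np.array([[16.0, 8.0, 0.0], [8.0, 16.0, 4.0], [0.0, 4.0, 16.0]])
c_c = c_a / 4.0  # region C: smaller errors, so it dominates the combination
combined, cov = combine_min_variance([v_a, v_c], [c_a, c_c])
print(combined, np.sqrt(np.diag(cov)))
```

Because C_C is C_A/4 in this toy, the combination reduces to 0.2 v_A + 0.8 v_C with covariance C_A/5. With real, differently shaped covariances the weights vary from bin to bin.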

We also subdivided the redshift range into a low-z and
high-z half, splitting at z = 0.55, and found no statistically
significant difference between the two samples (Figure 6; in the split
samples the fiber collision correction is more uncertain, so the
disagreement at the smallest R point is not very significant).
This result motivates our decision to analyze the data in a
single redshift slice. Slow evolution of the clustering is expected
for a highly biased population such as our luminous
galaxies, where the evolution of the bias approximately cancels the
evolution of the dark matter clustering (Fry 1996).

Even with only the 8 data points in w_p, deviations from a
pure power-law correlation function are apparent. These can
be traced to the non-power-law nature of the mass correlation
function and the way in which the galaxies occupy dark matter
halos; we will return to these issues in §4.

The calculation of errors in clustering measurements can be
done in a number of different ways (see Norberg et al. 2009
for discussion). We first tried a bootstrap estimate, dividing
the survey regions into 8–22 roughly equal-area "pixels" and
sampling from these regions with replacement (Efron & Gong
1983). Unfortunately, the irregular geometry and relatively
small sky coverage meant we were not able to obtain a
covariance matrix that was stable against changes in the
pixelization. We anticipate that as the survey progresses this
technique will become more robust. In the meantime, we
computed the covariance from a series of mock catalogs derived
from an iterative procedure using N-body simulations as
described in Appendix A. We will show in Figure A1 that the
distribution of χ² from our mock catalogs encompasses the
value obtained for the data in regions A, B and C if both are
computed using the mock-based covariance matrix and the
best-fitting HOD model (§4). This indicates that the
measurements we obtain are consistent with being drawn
from the underlying HOD model, given the finite number of
galaxies and observing geometry.
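The consistency check described above can be sketched as follows: estimate the covariance from the mocks, compute χ² for every mock and for the data using that same matrix, and ask where the data fall within the mock distribution. Gaussian random vectors stand in for the mock and data measurements, and the mock mean stands in for the best-fitting HOD model. Nothing here reproduces the paper's numbers.

```python
import numpy as np


def chi2(data, model, cov):
    """chi^2 = (d - m)^T C^-1 (d - m), via a linear solve rather than an explicit inverse."""
    resid = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    return float(resid @ np.linalg.solve(cov, resid))


rng = np.random.default_rng(42)
n_mocks, n_bins = 200, 8
mocks = rng.normal(size=(n_mocks, n_bins))   # stand-in for mock R*w_p vectors
cov = np.cov(mocks, rowvar=False)            # mock-based covariance (ddof = 1)
model = mocks.mean(axis=0)                   # stand-in for the best-fit HOD prediction
chi2_mocks = np.array([chi2(m, model, cov) for m in mocks])
data = rng.normal(size=n_bins)               # stand-in for the measured vector
p_value = np.mean(chi2_mocks >= chi2(data, model, cov))
print(f"mean mock chi2 = {chi2_mocks.mean():.2f}, fraction of mocks above data = {p_value:.2f}")
```

Using the mock mean as the model makes the average mock χ² exactly 8 × 199/200 = 7.96 by construction, which is a handy sanity check on the bookkeeping.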

3.2. Redshift-space clustering 

The angle-averaged redshift-space correlation function, ξ(s),
for the 0.4 < z < 0.7 sample is shown in Figure 7. Again,
the data were analyzed separately in each of regions A, B and
C. The dot-dashed line shows the same power-law correlation
function as described in Figure 5, while the solid line shows
the predicted ξ(s) from the model that best fits the w_p data
(above). The enhancement of clustering over the real-space
result on large scales (Kaiser 1987; for a review see Hamilton
1998, and for recent developments see Papai & Szapudi 2008;
Shaw & Lewis 2008) is evident in the comparison of the data
to the power law. The good agreement between the data and
the HOD model below a few Mpc is an indication that the
satellite fraction in the model is close to that in the data and that the
relative motions of the satellite galaxies are close to the motions
of the dark matter within the parent halos (i.e., any velocity
bias is small). The characteristic down-turn on scales smaller
than a few Mpc is expected from virial motions within
halos and the motion of halos themselves. The excess power of
the HOD model compared to the data on scales of a few Mpc
can be mitigated by increasing the degrees of freedom in the
model, for example by dropping the assumption that central
galaxies move with the mean halo velocity or follow the dark
matter radial profile, or by allowing a modest amount of satellite
velocity bias.
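The size of the large-scale enhancement can be estimated with Kaiser's linear-theory result for the angle-averaged correlation function, ξ(s)/ξ(r) = 1 + 2β/3 + β²/5 with β = f/b. The sketch assumes b = 2 (the bias quoted in the abstract), z = 0.55, and the common approximation f ≈ Ω_m(z)^0.55. It is an order-of-magnitude illustration and not the model used for the solid line in Fig. 7.

```python
def growth_rate(z, omega_m0=0.274, gamma=0.55):
    """Approximate growth rate f(z) ~ Omega_m(z)^gamma in flat LCDM."""
    a3 = (1.0 + z) ** 3
    omega_mz = omega_m0 * a3 / (omega_m0 * a3 + 1.0 - omega_m0)
    return omega_mz ** gamma


def kaiser_boost(bias, z, **kwargs):
    """Large-scale ratio xi(s)/xi(r) = 1 + 2*beta/3 + beta^2/5 with beta = f/b."""
    beta = growth_rate(z, **kwargs) / bias
    return 1.0 + 2.0 * beta / 3.0 + beta ** 2 / 5.0


print(f"f(0.55) = {growth_rate(0.55):.3f}, boost for b = 2: {kaiser_boost(2.0, 0.55):.3f}")
```

With these assumptions the boost is roughly 1.28. Because β scales as 1/b, a highly biased sample like this one shows a smaller fractional enhancement than an unbiased tracer would.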

On scales below tens of Mpc the violations of the distant
observer approximation are small, but on larger scales they begin
to become appreciable (Papai & Szapudi 2008) and should be 
included in any comparison between these data and a theoret- 
ical model (most noticeably for the higher multipoles). 

3.3. Cross-correlation 

Finally we consider w_p computed from the cross-correlation
of the imaging catalog with the spectroscopy; this allows us
to isolate the galaxies to a narrow redshift shell
and convert angles to (transverse) distances while at the same
time being insensitive to the details of the spectroscopic
selection, including the issue of fiber collisions². As described
in Padmanabhan et al. (2009), the angular cross-correlation of
the imaging and spectroscopic samples, with angles converted
to distances using the redshift of the spectroscopic member,
can be written as

$$ w_\times(R) = \langle f(\chi) \rangle\, w_p(R) \tag{10} $$

where f(χ) is the normalized radial distribution of the
photometric sample as a function of comoving distance, χ, and the
average is over the redshift distribution of the spectroscopic
sample. Note that w_×(R) is dimensionless, with f(χ) having
dimensions of inverse length and w_p having dimensions
of length.

2 One must still upweight some of the spectroscopic galaxies to account
for the fact that fiber collisions occur more often in dense regions. This issue
turns out to be a very small effect here, in part because BOSS is a deep survey
and the correlation between 2D over-density on the sky and 3D over-density
is washed out by projection.

FIG. 7. The redshift-space, isotropic correlation function for the 0.4 < z <
0.7 sample in regions A, B and C (points). The same power-law correlation
function that fits the w_p data on intermediate scales, with s_0 = 7.5 h^-1 Mpc,
is shown as the dot-dashed line, while the solid line is the prediction for ξ(s)
from the best-fitting HOD model to w_p, assuming no velocity bias for
satellites and that central galaxies are at rest in their halos. The good agreement
below a few Mpc is an indication that the satellite fraction in the model is
close to that in the data and any velocity bias is small.
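To see why ⟨f(χ)⟩ in Eq. (10) is of order 10^-3 in these units, the sketch below evaluates the average on a grid. It assumes, purely for illustration, that the photometric and spectroscopic samples both have Gaussian radial distributions of width 200 h^-1 Mpc centered near χ(z ≈ 0.55). For identical unit-normalized Gaussians the answer is 1/(2√π σ) ≈ 1.4 × 10^-3 h Mpc^-1. The function name and the distributions are hypothetical.

```python
import numpy as np


def mean_f(chi, f_phot, p_spec):
    """<f(chi)> in Eq. (10): the photometric radial distribution f, normalized to
    unit integral, averaged over the spectroscopic distribution (uniform chi grid)."""
    dchi = chi[1] - chi[0]
    f_phot = f_phot / (f_phot.sum() * dchi)   # units of inverse length
    p_spec = p_spec / (p_spec.sum() * dchi)   # probability density in chi
    return float(np.sum(f_phot * p_spec) * dchi)


chi = np.linspace(0.0, 3000.0, 30001)         # comoving distance grid, h^-1 Mpc
sigma = 200.0                                  # hypothetical width of both distributions
shape = np.exp(-0.5 * ((chi - 1450.0) / sigma) ** 2)
avg = mean_f(chi, shape, shape)
analytic = 1.0 / (2.0 * np.sqrt(np.pi) * sigma)  # integral of a squared unit Gaussian
print(f"<f> = {avg:.2e} h Mpc^-1 (analytic {analytic:.2e})")
```

A broader photometric distribution lowers ⟨f⟩ and therefore the cross-correlation signal. This is why the estimate of w_p from w_× is noisier than the one from the auto-correlation.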

Figure 8 shows the cross-correlation for regions A, B and
C along with a power-law correlation function. The
normalization of this figure differs from that of Figure 5 by a factor
of ⟨f(χ)⟩ ~ O(10^-3). Because the signal is suppressed by the
width of f(χ), the estimate of w_p from the cross-correlation
is significantly noisier than that from the auto-correlation (see
Myers et al. 2009, §2.1, for related discussion). The
cross-correlation estimate is consistent with our auto-correlation
results, but we have not attempted to fit any models to it directly.
We have extended the cross-correlation to smaller scales in
the figure to emphasize that there is significant power even
on very small scales, which are difficult to probe directly with
the auto-correlation function due to the fiber collision
problem.

4. HALO OCCUPATION MODELING 

In order to relate the observed clustering of galaxies to
the clustering of the underlying mass, and to make realistic
mock catalogs, we interpret our measurements within the
context of the halo occupation distribution (Peacock & Smith
2000; Seljak 2000; Benson et al. 2000; White et al. 2001;
Berlind & Weinberg 2002; Cooray & Sheth 2002). The halo
occupation distribution describes the number and distribution
of galaxies within dark matter halos. Since the clustering and
space density of the latter are predictable functions of redshift,
any HOD model makes predictions for a wide range of
observational statistics. Rather than perform a simultaneous fit to
the real- and redshift-space correlation functions (including
their covariances), we choose to fit to the real-space
clustering only and show that the models which best fit these data
also provide a reasonable description of the redshift-space
clustering results. This avoids the need to make additional
assumptions for modeling the redshift-space correlation
function. We also implicitly assume that we are measuring a
uniform sample of galaxies across the entire redshift range, so
that a single HOD makes sense. We tested this assumption by
splitting the sample into high- and low-redshift subsamples.

FIG. 8. The cross-correlation function, w_×(R), of the spectroscopic and
photometric samples, which is proportional to w_p(R) (Eq. 10). The
dot-dashed line represents a power-law correlation function. Error bars have
been suppressed to avoid obscuring the figure. Due to the small value of
⟨f(χ)⟩ ~ O(10^-3) the error bars are significant, especially at large scales,
and are roughly the difference between the plotted lines for regions A, B and
C.
We use a halo model which distinguishes between central 
and satellite galaxies with the mean occupancy of halos: 



N(M) = (N g .JM h . dlo ))=N cen (M)+N s:it (M) 



(11) 



Each halo either hosts a central galaxy or does not, while the number of satellites is Poisson distributed with mean $N_{\rm sat}$. The mean number of central galaxies per halo is modeled with³

$$N_{\rm cen}(M) = \frac{1}{2}\,{\rm erfc}\left[\frac{\ln(M_{\rm cut}/M)}{\sqrt{2}\,\sigma}\right] \qquad (12)$$

and

$$N_{\rm sat}(M) = N_{\rm cen}(M)\left(\frac{M - \kappa M_{\rm cut}}{M_1}\right)^{\alpha} \qquad (13)$$

for $M > \kappa M_{\rm cut}$ and zero otherwise. This form implicitly assumes that halos do not host satellite galaxies without hosting centrals, which is at best an approximation but is reasonable for the purposes of computing projected clustering. Different functional forms have been proposed in the literature, but the current form is flexible enough for our purposes.

³ Note that our definition of σ can be interpreted as a fractional "scatter" in mass at threshold but differs by a factor of $\ln(10)/\sqrt{2}$ from that in Zheng et al. (2005).

FIG. 9. — The mean occupancy of halos as a function of halo mass for our full sample. The shaded band indicates the ±1σ range determined from our Markov chain analysis. The dashed and dotted lines show the average $N_{\rm cen}$ and $N_{\rm sat}$, respectively.

To explore the plausible range of HOD parameter space we applied the Markov Chain Monte Carlo method (MCMC; see, e.g., Gilks et al. 1996) to the $w_p$ data using a $\chi^2$-based likelihood. This method generates a "chain" of HOD parameters whose frequency of appearance traces the likelihood of that model fitting the data. It works by generating random HODs from a trial distribution (in our case a multi-dimensional Gaussian), populating a simulation cube with galaxies according to that HOD, computing $w_p$ from the periodic box by pair counts, and accepting or rejecting the HOD based on the relative likelihood of the fit. The step sizes and directions are determined from the covariance matrix of a previous run of the chain. Given the chain, the probability distribution of any statistic derivable from the parameters can easily be computed. We show the mean occupancy of halos as a function of mass, N(M), in Figure 9, where the band indicates the ±1σ spread within the chain. The mean (galaxy-weighted) halo mass is $\langle M_{180b}\rangle_{\rm gal} = (2.8 \pm 0.15)\times 10^{13}\,h^{-1}M_\odot$ and the satellite fraction is (10 ± 2)%. Here we quote the mass interior to a sphere within which the mean density is 180 times the background density, rather than the friends-of-friends mass, to facilitate comparison with other work. The values of the HOD parameters are given in Table 2.
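The accept/reject loop described above is the Metropolis algorithm with a Gaussian proposal. The sketch below is a minimal illustration of it in Python/NumPy and rests on several assumptions. The model is a hypothetical straight line, standing in for the $w_p(R)$ that the real analysis obtains by populating a simulation with each trial HOD. `prop_cov` stands in for the covariance taken from a previous chain. The names `chi2` and `metropolis` are illustrative only.

```python
import numpy as np


def chi2(params, model, data, inv_cov):
    """chi^2 = (d - m)^T C^{-1} (d - m) for the model evaluated at params."""
    resid = data - model(params)
    return float(resid @ inv_cov @ resid)


def metropolis(model, data, inv_cov, start, prop_cov, n_steps, seed=0):
    """Metropolis sampler with a multivariate-Gaussian proposal of covariance prop_cov."""
    rng = np.random.default_rng(seed)
    current = np.asarray(start, dtype=float)
    current_chi2 = chi2(current, model, data, inv_cov)
    chain = np.empty((n_steps, current.size))
    for i in range(n_steps):
        trial = rng.multivariate_normal(current, prop_cov)
        trial_chi2 = chi2(trial, model, data, inv_cov)
        # Accept with probability min(1, L_trial / L_current), where L = exp(-chi^2 / 2).
        if np.log(rng.uniform()) < 0.5 * (current_chi2 - trial_chi2):
            current, current_chi2 = trial, trial_chi2
        chain[i] = current  # a rejected step repeats the current point
    return chain


# Hypothetical toy problem: a straight line stands in for the HOD prediction of w_p(R).
x = np.linspace(0.0, 1.0, 20)
noise = 0.05


def line(p):
    return p[0] * x + p[1]


data = line([2.0, 0.5]) + np.random.default_rng(1).normal(0.0, noise, x.size)
inv_cov = np.eye(x.size) / noise**2

chain = metropolis(line, data, inv_cov, start=[1.0, 0.0],
                   prop_cov=np.diag([0.02, 0.01]) ** 2, n_steps=10000)
burned = chain[2000:]  # discard burn-in
print("posterior mean:", burned.mean(axis=0), "posterior std:", burned.std(axis=0))
```

Because every accepted or rejected step is recorded, the frequency of each parameter value in `burned` traces the likelihood. Any derived statistic, such as $N(M)$ in Figure 9, can then be computed row by row over the chain.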

In addition to the purely statistical errors, shown in the figure and quoted above, there are systematic uncertainties. Our correction for fiber collisions only significantly impacts the smallest-R point in our calculation. If we increase the error on that point by a factor of 10, effectively removing it from the fit, the results change to $\langle M_{180b}\rangle_{\rm gal} = (2.6 \pm 0.15)\times 10^{13}\,h^{-1}M_\odot$ and (7 ± 2)%, respectively, which are shifts of approximately 1σ. Additional uncertainty arises from the background cosmology (held fixed in this paper) and from methodological choices. A comparison of different methods for performing the halo modeling (using different mass definitions or halo profiles, analytic vs. numerical methods, different ways of enforcing halo exclusion, etc.) suggests an additional O(10%) "systematic" uncertainty. It would be interesting to check the assumptions going into this HOD analysis, and the inferences so derived, with additional data and luminosity-dependent modeling.
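Inflating the error on a single point by a factor of 10 multiplies its weight in $\chi^2$ by 1/100. This is why it effectively removes that point from the fit. The minimal NumPy sketch below shows this, assuming the error is inflated by scaling the corresponding row and column of the covariance matrix, which leaves the correlation coefficients unchanged. The residuals are made-up numbers for illustration, not BOSS data, and `inflate_point` is a hypothetical helper.

```python
import numpy as np


def chi2(resid, cov):
    """chi^2 = r^T C^{-1} r, using a linear solve rather than an explicit inverse."""
    return float(resid @ np.linalg.solve(cov, resid))


def inflate_point(cov, index, factor):
    """Scale the 1-sigma error of one point by `factor`: its row and column of C are
    multiplied by `factor`, so its variance grows by factor**2."""
    scale = np.ones(cov.shape[0])
    scale[index] = factor
    return cov * np.outer(scale, scale)


# Hypothetical residuals (data - model) and a diagonal covariance, for illustration only.
resid = np.array([3.0, 0.5, -0.8, 0.2])
cov = np.eye(4)
print(chi2(resid, cov))                          # all points at full weight
print(chi2(resid, inflate_point(cov, 0, 10.0)))  # first (smallest-R) point down-weighted by 100
```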

The halo occupancy of massive galaxies at these redshifts has been investigated before based on both photometric (White et al. 2007; Blake et al. 2008; Brown et al. 2008; Padmanabhan et al. 2009) and spectroscopic (Ross et al. 2007, 2008; Wake et al. 2008; Zheng et al. 2009; Reid & Spergel 2009) samples. Accounting for differences in sample selection and redshift range, our results appear quite consistent with the previous literature (see Figure 10).

TABLE 2

The mean and standard deviation of the HOD parameters (see Eqs. 12 and 13) from our Markov chain. The particular values for our best-fit model are given in parentheses.

Parameter            Mean ± s.d.      (Best fit)
$\lg M_{\rm cut}$    13.08 ± 0.12     (13.04)
$\lg M_1$            14.06 ± 0.10     (14.05)
$\sigma$             0.98 ± 0.24      (0.94)
$\kappa$             1.13 ± 0.38      (0.93)
$\alpha$             0.90 ± 0.19      (0.97)

FIG. 10. — The HOD parameters, $M_{\rm cut}$ and $M_{1,\rm sat}$, as a function of number density for a variety of intermediate-redshift, massive galaxy samples from the literature (cf. Fig. 12 of Brown et al. 2008). Here $M_{1,\rm sat}$ is the halo mass which hosts, on average, one satellite. This quantity is easier to compare when different functional forms for N(M) are in use. The data are taken from Phleps et al. (2006), Mandelbaum et al. (2006), Kulkarni et al. (2007), Blake et al. (2008), Brown et al. (2008), Padmanabhan et al. (2009), Wake et al. (2008), Zheng et al. (2009) and this work, as noted in the legend. Error bars on the individual points have been suppressed for clarity, but are typically 0.1 dex. The solid line in the lower panel shows the halo mass function at z = 0.55 for comparison. The value of $M_{\rm cut}$ for a sample of only central galaxies with no scatter between observable and halo mass would follow this line.
Our galaxies populate a broad range of halo masses. For massive halos, the mean number of galaxies per halo depends on halo mass approximately as a power law, with a broad roll-off at lower halo masses. The low-mass behavior is driven by the amplitude of the large-scale clustering in combination with the relatively high number density of our sample, and encodes information about the scaling of central galaxy luminosity with halo mass and its distribution. We find that halos with masses $(2\mbox{--}3)\times 10^{13}\,h^{-1}M_\odot$ contain on average one of our massive galaxies. At these redshifts such halos are quite highly biased (see below) and correspond to galaxy groups. We expect $b(z) \propto 1/D(z)$, where $D(z)$ is the linear growth factor, leading to an approximately constant clustering amplitude with redshift.
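The growth factor that enters $b(z) \propto 1/D(z)$ can be computed for the flat ΛCDM cosmology of Appendix A ($\Omega_m = 0.274$, $\Omega_\Lambda = 0.726$). The sketch below assumes the standard integral solution for matter plus a cosmological constant, $D(a) \propto E(a)\int_0^a da'/[a'E(a')]^3$ with $E(a) = (\Omega_m a^{-3} + \Omega_\Lambda)^{1/2}$. It normalizes to $D(0) = 1$ and evaluates the integral with a simple midpoint rule. `growth_factor` is a hypothetical name. It should give approximately 0.76 at z = 0.55, consistent with the value quoted in the next paragraph. It also shows how a fixed $b\,D$ translates into $b(z)$ across the sample's redshift range.

```python
import math


def growth_factor(z, omega_m=0.274, omega_l=0.726, n_steps=20000):
    """Linear growth factor D(z) for flat matter + cosmological constant, with D(0) = 1."""

    def e_of_a(a):
        # Dimensionless Hubble rate H(a)/H0, neglecting radiation.
        return math.sqrt(omega_m / a**3 + omega_l)

    def unnormalized(a):
        # Midpoint rule for E(a) * integral_0^a da' / (a' E(a'))^3; the integrand -> 0 as a' -> 0.
        h = a / n_steps
        total = 0.0
        for k in range(n_steps):
            ak = (k + 0.5) * h
            total += 1.0 / (ak * e_of_a(ak)) ** 3
        return e_of_a(a) * total * h

    return unnormalized(1.0 / (1.0 + z)) / unnormalized(1.0)


d_055 = growth_factor(0.55)
print(f"D(0.55) = {d_055:.3f}")
# If b(z) D(z) is constant, b = 2 at z = 0.55 implies b(z) = 2 D(0.55) / D(z).
for z in (0.45, 0.55, 0.65):
    print(f"z = {z}: D = {growth_factor(z):.3f}, b = {2.0 * d_055 / growth_factor(z):.2f}")
```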

The majority of our galaxies are central galaxies residing in $10^{13}\,h^{-1}M_\odot$ halos, but a non-negligible fraction are satellites which live primarily in halos ~10 times more massive. The width of this "plateau" ($M_1/M_{\rm cut}$) is smaller than that found for less luminous systems at lower redshift. However, it continues the trends seen in Zheng et al. (2009) for plateau width and satellite fraction as a function of luminosity. This increase in the satellite fraction drives the visibility of the fingers-of-god in the correlation function (Figure 4) and the small-scale upturn in $w_p$.

FIG. 11. — The probability per $\log_{10} M_{\rm halo}$ (upper) or cumulative probability (lower) that a galaxy in our sample is hosted by a halo of mass $M_{\rm halo}$. Note the broad range of halo masses probed by our galaxies, and the low probability of finding one of our galaxies in very high mass halos (due to the sparsity of such halos at this redshift).

An alternative view of the halo occupation is presented in Figure 11, which shows the probability that a galaxy in our sample is hosted by a halo of mass M. Note the broad range of halo masses probed by our galaxies, and the low probability of finding one of our galaxies in very high mass halos, which is a consequence of the sparsity of such halos at this redshift.
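The quantities in Figure 11, and the satellite fraction quoted earlier in this section, are weighted integrals of the HOD over the halo mass function. A galaxy is hosted by a halo of mass M with probability proportional to $(dn/d\ln M)\,N(M)$. The NumPy sketch below illustrates the bookkeeping using the best-fit HOD of Table 2. It uses a purely hypothetical, Schechter-like stand-in for $dn/d\ln M$ (`toy_mass_function`), not the simulation mass function used in the paper, so its printed numbers are not expected to match those quoted in the text. The mean mass here is also computed on the HOD's own (FoF) mass scale rather than $M_{180b}$.

```python
import math
import numpy as np

# Best-fit HOD (Table 2); masses in h^-1 M_sun.
LOG_MCUT, LOG_M1, SIGMA, KAPPA, ALPHA = 13.04, 14.05, 0.94, 0.93, 0.97
erfc = np.vectorize(math.erfc)


def occupation(m):
    """Return (N_cen, N_sat) on an array of halo masses, following Eqs. (12)-(13)."""
    m_cut, m_1 = 10.0**LOG_MCUT, 10.0**LOG_M1
    n_cen = 0.5 * erfc(np.log(m_cut / m) / (math.sqrt(2.0) * SIGMA))
    excess = np.clip(m - KAPPA * m_cut, 0.0, None)  # zero for M <= kappa * M_cut
    return n_cen, n_cen * (excess / m_1) ** ALPHA


def toy_mass_function(m, m_star=1e14, slope=-0.9):
    """HYPOTHETICAL dn/dlnM shape with arbitrary normalization (illustration only)."""
    return (m / m_star) ** slope * np.exp(-m / m_star)


log_m = np.linspace(11.0, 15.5, 901)  # uniform grid in log10 M
m = 10.0**log_m
n_cen, n_sat = occupation(m)
dn = toy_mass_function(m)

# Probability that a galaxy sits in each log10 M bin: proportional to dn/dlnM * N(M).
p_gal = dn * (n_cen + n_sat)
p_gal /= p_gal.sum()
cumulative = np.cumsum(p_gal)  # analogue of the lower panel of Figure 11

f_sat = np.sum(dn * n_sat) / np.sum(dn * (n_cen + n_sat))
mean_mass = np.sum(p_gal * m)  # galaxy-weighted mean halo mass
print(f"f_sat = {f_sat:.3f}, <M>_gal = {mean_mass:.2e} h^-1 M_sun")
```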

FIG. 12. — The scale-dependence of the bias, $b(r) = [\xi_{\rm gal}(r)/\xi_{\rm dm}(r)]^{1/2}$, predicted from our best-fit halo model and N-body simulations. The feature at a few Mpc has been seen in other analyses and occurs at the transition between the 1- and 2-halo contributions (see text). Note that the bias asymptotes to a constant, b ~ 2, on large scales.

The N-body simulations can also be used to infer the scale-dependence of the bias, $b(r) = [\xi_{\rm gal}(r)/\xi_{\rm dm}(r)]^{1/2}$, for our best-fitting halo model. This is shown in Figure 12, where we see that above 10–20 $h^{-1}$Mpc the bias approaches a constant, b ~ 2. For our cosmology the linear growth factor at z = 0.55 is 0.762, so $b\,\sigma_8(z=0.55) \simeq 1.3$, and this is assumed constant across our redshift range. This is very similar to the results obtained for photometric LRG samples at comparable redshifts (Blake et al. 2007; Ross et al. 2007; Padmanabhan et al. 2007, 2009; Blake et al. 2008). The rapid rise of b(r) at very small scales is expected. These galaxies are well known to exhibit an almost power-law correlation function at small scales, while the non-linear $\xi_{\rm dm}(r)$ is predicted to fall below a power law at small r. (Most galaxy pairs on these scales are central-satellite pairs, whereas for the dark matter there is no such distinction, so $\xi_{\rm dm}$ is the convolution of the halo radial profile with itself.) The feature in b(r) at a few Mpc occurs at the transition between the 1- and 2-halo contributions, i.e. pairs of galaxies that lie within a single dark matter halo vs. those which lie in separate halos. The rise at slightly larger scales comes from the scale-dependence of the halo bias. Note that the combination of the high clustering amplitude and number density makes this sample particularly powerful for probing large-scale structure at z ~ 0.5.
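As a quick consistency check on these numbers, the following arithmetic assumes that the combination quoted above is $b\,\sigma_8(z=0.55) \simeq 1.3$. It also assumes $\sigma_8(z) = \sigma_8 D(z)$ with $\sigma_8 = 0.8$ (Appendix A) and $D$ normalized to unity today:

```latex
\sigma_8(z=0.55) = \sigma_8\,D(0.55) = 0.8 \times 0.762 \simeq 0.61,
\qquad
b \simeq \frac{b\,\sigma_8(z=0.55)}{\sigma_8(z=0.55)} \simeq \frac{1.3}{0.61} \simeq 2.1 ,
```

which agrees with the large-scale asymptote b ~ 2 of Figure 12. Because $b(z)D(z)$ is taken to be constant, the same $b\,\sigma_8(z)$ applies across 0.4 < z < 0.7.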

Finally, using the best-fitting HOD model from the chain and a series of N-body simulations, we generate mock catalogs as described in more detail in Appendix A. These are passed through the observational masks and cuts in order to mimic the observations. They can then be analyzed in the same manner to generate a set of mock measurements, from which we compute covariance matrices and other statistical quantities. We match the redshift distribution of the sample to our constant-number-density simulation boxes by randomly subsampling the galaxies as a function of redshift (with 100% sampling at the peak near z ≈ 0.55). This is consistent with our earlier assumption that the HOD describes a single population of objects and that the dN/dz reflects observational selection effects. We obtain similar HODs when fitting to thinner redshift slices, which lends credence to this view.
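Random subsampling of a constant-number-density mock to a target redshift distribution can be sketched as follows. This is a minimal NumPy illustration under stated assumptions. `n_target` is a hypothetical relative number density per redshift bin, and the bin with the largest value is kept at 100%, mirroring the full sampling at the peak described above. Galaxies outside the binned range are dropped. The function name and the binning are illustrative, not the survey pipeline.

```python
import numpy as np


def subsample_to_nz(z_mock, z_edges, n_target, seed=0):
    """Boolean mask selecting mock galaxies so their n(z) follows n_target's shape."""
    rng = np.random.default_rng(seed)
    z_mock = np.asarray(z_mock, dtype=float)
    keep_prob = np.asarray(n_target, dtype=float) / np.max(n_target)  # peak bin -> 1
    inside = (z_mock >= z_edges[0]) & (z_mock < z_edges[-1])
    bins = np.clip(np.digitize(z_mock, z_edges) - 1, 0, keep_prob.size - 1)
    return inside & (rng.uniform(size=z_mock.size) < keep_prob[bins])


# Hypothetical inputs: mock redshifts and a made-up relative n(z) in six bins.
z_mock = np.random.default_rng(42).uniform(0.4, 0.7, 100_000)
z_edges = np.linspace(0.4, 0.7, 7)
n_target = [0.3, 0.6, 1.0, 0.9, 0.5, 0.2]
keep = subsample_to_nz(z_mock, z_edges, n_target)
print(f"kept {keep.mean():.3f} of the mock galaxies")
```

Because the selection depends only on redshift, the clustering of the retained galaxies is unbiased. Only the number density, and hence the shot noise, varies with z, as it does in the data.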

5. DISCUSSION 

The Baryon Oscillation Spectroscopic Survey is in the process of taking spectra for 1.5 million luminous galaxies and 150,000 quasars. Its aims are to make a precision determination of the scale of baryon oscillations and to study the growth of structure and the evolution of massive galaxies. We have presented measurements of the clustering of 44,000 massive galaxies at z ~ 0.5 from the first semester of BOSS data. These show that BOSS is performing well and that the galaxies we are targeting have properties in line with expectations (Schlegel et al. 2009).

The CMASS sample at z ~ 0.5 has a large-scale bias of b ~ 2 (Fig. 12) and a number density several times higher than the earlier spectroscopic LRG sample of Eisenstein et al. (2001). This makes it an ideal sample for studying large-scale structure. The majority of our CMASS galaxies are central galaxies residing in $10^{13}\,h^{-1}M_\odot$ halos, but a non-negligible fraction are satellites which live primarily in halos ~10 times more massive.

The data through July 2010 do not cover enough volume 
to robustly detect the acoustic peak in the correlation func- 
tion in this sample, one of the science goals of BOSS. While 
no definitive detection is possible at present, the error bars 
are anticipated to shrink rapidly as we collect more redshifts; 
BOSS should be able to constrain the acoustic scale at z ~ 0.5 
within the next year, with the constraints becoming increas- 
ingly tight as the survey progresses. 
APPENDIX

N-BODY SIMULATIONS AND MOCK CATALOGS 

We make use of several simulations in this paper. The main set is 20 different realizations of the ΛCDM family with $\Omega_m = 0.274$, $\Omega_\Lambda = 0.726$, $h = 0.7$, $n = 0.95$ and $\sigma_8 = 0.8$ (in agreement with a wide array of observations). Briefly, each simulation employs an updated version of the TreePM code described in White (2002) to evolve $1500^3$ equal-mass ($7.6\times 10^{10}\,h^{-1}M_\odot$) particles in a periodic cube of side length $1500\,h^{-1}$Mpc with a Plummer-equivalent smoothing of $36\,h^{-1}$kpc. The initial conditions were generated by displacing particles from a regular grid using second-order Lagrangian perturbation theory at z = 75, where the rms displacement is 10% of the mean inter-particle spacing. This TreePM code has been compared to a number of other codes and shown to perform well for such simulations (Heitmann et al. 2008). Recently the code has been modified to use a hybrid MPI+OpenMP approach, which is particularly efficient for modern clusters.
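The quoted particle mass follows from the mean matter density. As a check, using the standard critical density $\rho_{\rm crit} = 2.775\times 10^{11}\,h^2 M_\odot\,{\rm Mpc}^{-3}$:

```latex
m_p = \frac{\Omega_m\,\rho_{\rm crit}\,L^3}{N_p}
    = \frac{0.274 \times 2.775\times 10^{11}\,h^{2}M_\odot\,{\rm Mpc}^{-3} \times (1500\,h^{-1}{\rm Mpc})^3}{1500^3}
    \simeq 7.6\times 10^{10}\,h^{-1}M_\odot ,
```

The mean interparticle spacing is $L/N_p^{1/3} = 1\,h^{-1}$Mpc, so the $36\,h^{-1}$kpc smoothing length is 3.6% of that spacing.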

FIG. A1. — The distribution of $\chi^2$ for $w_p$ from our mock catalogs (histograms) and from the data (vertical dashed lines) in regions A, B and C. The $\chi^2$ is computed for the measured or mock $w_p$ compared to the best-fitting HOD model, using the covariance matrix computed from the mocks. The measurements we obtain are completely consistent with being drawn from the underlying HOD model, given the finite number of galaxies and observing geometry.

For each output we found dark matter halos using the Friends-of-Friends (FoF) algorithm (Davis et al. 1985) with a linking length of 0.168 times the mean interparticle spacing. This partitions the particles into equivalence classes roughly bounded by isodensity contours of 100 times the mean density. For each halo we store the position of the most-bound particle, the center-of-mass velocity and a random subset of the member particles. These are used as input to the halo occupation distribution modeling and mock catalogs. Throughout, we use the sum of the masses of the particles linked by the FoF algorithm as our basic definition of halo mass. The exception is $\langle M_{180b}\rangle_{\rm gal}$ in §4, where we use spherical over-density (SO) masses to facilitate comparison with other work. Note that we do not run an SO finder to define new groups. We use the FoF halo catalog, only computing a different mass for each FoF halo. To compute these SO masses we grow spheres outwards from the most-bound particle in each FoF halo. We stop when the mean density of the enclosed material (including both halo and non-halo particles) is 180 times the background density. We denote the total enclosed mass by $M_{180b}$.
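The sphere-growing step can be sketched in a few lines. The NumPy function below is a minimal illustration, assuming equal-mass particles in a periodic box (minimum-image distances), a center at the most-bound particle, and a threshold of 180 times the mean background density. It returns the enclosed mass just before the mean enclosed density first drops below the threshold. The function name and arguments are hypothetical, and a production code would use a spatial tree rather than sorting every particle in the box.

```python
import numpy as np


def so_mass(positions, center, particle_mass, box_size, mean_density, delta=180.0):
    """Grow a sphere from `center` until the mean enclosed density first falls
    below delta * mean_density; return the mass enclosed just before that radius."""
    d = positions - center
    d -= box_size * np.round(d / box_size)          # minimum-image convention
    r = np.sort(np.sqrt(np.sum(d**2, axis=1)))
    n_enc = np.arange(1, r.size + 1)                # particles within radius r[i]
    with np.errstate(divide="ignore"):
        density = n_enc * particle_mass / (4.0 / 3.0 * np.pi * r**3)
    below = density < delta * mean_density
    if not below.any():
        return float(n_enc[-1] * particle_mass)
    first = int(np.argmax(below))                   # first radius below threshold
    return float(n_enc[first - 1] * particle_mass) if first > 0 else 0.0


# Hypothetical test configuration: a uniform background plus a dense clump of 500 particles.
rng = np.random.default_rng(0)
box = 100.0
background = rng.uniform(0.0, box, size=(1000, 3))
direction = rng.normal(size=(500, 3))
direction /= np.linalg.norm(direction, axis=1, keepdims=True)
radius = rng.uniform(size=(500, 1)) ** (1.0 / 3.0)  # uniform within a unit sphere
center = np.array([50.0, 50.0, 50.0])
positions = np.vstack([background, center + direction * radius])
mean_density = 1000 / box**3                        # background particles per unit volume
m_so = so_mass(positions, center, 1.0, box, mean_density)
print(m_so)
```

As the text notes, all particles (halo and non-halo) count toward the enclosed mass. In this toy case the clump dominates, and the sphere stops at a radius where only a few background particles have been added.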

All of the mock observational samples are assumed to be iso-redshift, and "static" outputs are used as input to the modeling. The assumption of non-evolving clustering over the relevant redshift range is theoretically expected for a highly biased population, and is also borne out by our modeling (§4) and measurements (§3).

Once a set of HOD parameter values has been chosen, we populate each halo in a given simulation with mock "galaxies". 
The HOD provides the probabilities that a halo will contain a central galaxy and the number of satellites. The central galaxy 
is placed at the position of the most bound particle in the halo, and we randomly draw dark matter particles to represent the 
satellites, assuming that the satellite galaxies trace the mass profile within halos. This approach has the advantage of retaining 
any alignments between the halo material, the filamentary large-scale structure and the velocity field. 
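The population step for a single halo might look like the NumPy sketch below. It assumes, as in Eq. (13), that satellites occur only in halos that host a central. Once a central is present, the satellite count is therefore drawn from a Poisson distribution with mean $N_{\rm sat}/N_{\rm cen}$, which keeps the overall mean at $N_{\rm sat}(M)$. `populate_halo` is a hypothetical name. Velocities are omitted for brevity.

```python
import numpy as np


def populate_halo(center, members, n_cen_mean, n_sat_mean, rng):
    """Return an (n, 3) array of mock galaxy positions for one halo.

    center:     position of the most-bound particle (used for the central).
    members:    (m, 3) array of stored member-particle positions.
    n_cen_mean: N_cen(M) from Eq. (12), a probability in [0, 1].
    n_sat_mean: N_sat(M) from Eq. (13).
    """
    galaxies = []
    if rng.uniform() < n_cen_mean:
        galaxies.append(np.asarray(center, dtype=float))
        # Satellites only accompany a central, so condition the Poisson mean on it.
        n_sat = rng.poisson(n_sat_mean / n_cen_mean)
        if n_sat > 0:
            members = np.asarray(members, dtype=float)
            idx = rng.choice(len(members), size=n_sat, replace=n_sat > len(members))
            galaxies.extend(members[idx])  # satellites trace the halo's mass profile
    return np.array(galaxies, dtype=float).reshape(-1, 3)


rng = np.random.default_rng(7)
members = rng.normal(0.0, 0.3, size=(200, 3))  # hypothetical member particles
print(populate_halo(np.zeros(3), members, 0.9, 1.5, rng))
```

Drawing satellites from actual member particles, rather than from an analytic profile, is what retains the alignments with the halo material and large-scale structure mentioned above.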

Since the observational geometry is in some cases highly elongated (Fig. 3), we use volume remapping (Carlson & White 2010) on the periodic cubes to encompass many realizations of the sample within each box. The mock galaxies are then observed in a way analogous to the actual sample, with the completeness mask and redshift cuts applied to generate several hundred "mock surveys". (Overall we have 1500 mock surveys, divided into 900, 360 and 240 mock surveys of regions A, B and C, respectively. However, they are not all completely independent, as we have only ~100 times as much volume in the simulations as in the largest region, C.) For technical reasons, and since it only affects the smallest-scale $w_p$ point, we do not model fiber collisions. Instead we increase the errors for that point by the square root of the ratio of the pair counts in the photometric sample to that in the spectroscopic sample (i.e. the same correction applied to the data-data pairs in computing $\xi(R,Z)$). This correction is appropriate in the limit that the error is dominated by Poisson counting statistics.
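The size of this error inflation follows from Poisson statistics. Assume the mocks, which have no fiber collisions, have pair counts comparable to the full photometric sample, $N_{\rm phot}$, while the data contribute only $N_{\rm spec}$ pairs at the smallest separation. Then

```latex
\frac{\sigma}{w_p} \propto N_{\rm pairs}^{-1/2}
\quad\Longrightarrow\quad
\frac{\sigma_{\rm data}}{\sigma_{\rm mock}}
  = \frac{N_{\rm spec}^{-1/2}}{N_{\rm phot}^{-1/2}}
  = \sqrt{\frac{N_{\rm phot}}{N_{\rm spec}}} ,
```

so the mock-based error on the smallest-R point is multiplied by $\sqrt{N_{\rm phot}/N_{\rm spec}}$.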

The covariance matrices for the clustering statistics are obtained from the mocks, and the entire procedure (reconstructing 
the best-fit with the new covariance matrix, recomputing the mock catalogs and recomputing the clustering) is iterated until 
convergence. Given a reasonable starting HOD, the procedure converges within two or three steps. 

Over the range of scales probed in this paper the correlation function is quite well constrained, and we find that the distribution of $w_p$ values in the mocks is well fit by a Gaussian at each R. This suggests we are able to use a Gaussian form for the likelihood, which is backed up by the distribution of $\chi^2$ values seen in Figure A1.
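The covariance and $\chi^2$ bookkeeping behind Figure A1 can be sketched with NumPy as below. This illustration rests on several assumptions. The "mocks" are Gaussian draws from a made-up model vector and covariance, not the BOSS mocks. The number of mocks (900) merely echoes region A, and the helper names are hypothetical. If the likelihood is Gaussian, the mock $\chi^2$ values should scatter about the number of R bins, and the data's $\chi^2$ can be ranked against them.

```python
import numpy as np


def mock_covariance(mock_wp):
    """Sample covariance of w_p; rows are mock surveys, columns are R bins."""
    return np.cov(mock_wp, rowvar=False)


def chi2_values(vectors, model, cov):
    """chi^2 of each row of `vectors` about `model`, using C^{-1} via a linear solve."""
    resid = np.atleast_2d(vectors) - model
    return np.einsum("ij,ji->i", resid, np.linalg.solve(cov, resid.T))


# Hypothetical setup: 10 R bins with correlated Gaussian scatter, 900 mocks.
rng = np.random.default_rng(3)
n_bins, n_mocks = 10, 900
model = np.linspace(100.0, 10.0, n_bins)
idx = np.arange(n_bins)
true_cov = 4.0 * 0.5 ** np.abs(idx[:, None] - idx[None, :])
mocks = rng.multivariate_normal(model, true_cov, size=n_mocks)

cov = mock_covariance(mocks)
chi2_mocks = chi2_values(mocks, model, cov)
fake_data = rng.multivariate_normal(model, true_cov)  # stands in for the measurement
chi2_data = chi2_values(fake_data, model, cov)[0]
print(f"mean mock chi^2 = {chi2_mocks.mean():.2f} for {n_bins} bins")
print(f"fraction of mocks with larger chi^2 than the 'data': {(chi2_mocks > chi2_data).mean():.2f}")
```

In the iterative scheme described above, `cov` would be re-estimated from fresh mocks after each refit until the best-fitting HOD stops changing.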